N-gram Adaptation with Dynamic Interpolation Coefficient Using Information Retrieval Technique

Authors

  • Joon-Ki Choi
  • Yung-Hwan Oh
Abstract

This study presents an N-gram adaptation technique for the case where no additional text data are available for adaptation. We apply the language modeling approach to information retrieval (IR) to collect an appropriate adaptation corpus from the baseline text data. We propose a dynamic interpolation coefficient for merging the N-grams, where the coefficient is estimated from the word hypotheses obtained by segmenting the input speech. Experimental results show that the proposed adapted N-gram always performs better than the background N-gram.

Key words: language model adaptation, adaptation corpus, dynamic interpolation coefficient, speech recognition
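The abstract does not say how the interpolation coefficient is computed. As a rough illustration only, the sketch below assumes plain linear interpolation of the adapted and background N-gram probabilities, with the weight fitted by the standard EM update on the words of a recognition hypothesis. The names `estimate_lambda` and `interpolate` and the example probabilities are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def estimate_lambda(p_adapt: Sequence[float],
                    p_back: Sequence[float],
                    iters: int = 20,
                    init: float = 0.5) -> float:
    """EM estimate of the weight on the adapted model (assumed method).

    p_adapt[i] and p_back[i] are the probabilities that the adapted and
    background N-grams assign to the i-th hypothesis word given its history.
    Both are assumed to be smoothed, i.e. strictly positive.
    """
    if len(p_adapt) == 0 or len(p_adapt) != len(p_back):
        raise ValueError("need two non-empty sequences of equal length")
    lam = init
    for _ in range(iters):
        # E-step: posterior that each word was generated by the adapted model.
        post = [lam * a / (lam * a + (1.0 - lam) * b)
                for a, b in zip(p_adapt, p_back)]
        # M-step: the new weight is the mean posterior.
        lam = sum(post) / len(post)
    return lam


def interpolate(p_adapt: float, p_back: float, lam: float) -> float:
    """Linearly interpolated N-gram probability for one word."""
    return lam * p_adapt + (1.0 - lam) * p_back


# Hypothetical per-word probabilities for one segment's word hypotheses.
adapted = [0.20, 0.10, 0.30]
background = [0.05, 0.02, 0.10]
lam = estimate_lambda(adapted, background)
merged = interpolate(adapted[0], background[0], lam)
```

Under this assumption the weight is re-estimated for each segment of input speech, which is one way a coefficient could be "dynamic"; the paper itself may define it differently.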


Similar resources

A class based approach to domain adaptation and constraint integration for empirical m-gram models

The first class based adaptation approaches [FGH+97, Ueb97] take the use of classes in the construction of statistical m-gram models one significant step further than just using them as a smoothing technique: The m-gram of classes is trained on the large background corpus while the word likelihoods given the class are estimated on the small target corpus. To make full use of this technique a spec...


Comparison of s-gram Proximity Measures in Out-of-Vocabulary Word Translation

Classified s-grams have been successfully used in cross-language information retrieval (CLIR) as an approximate string matching technique for translating out-of-vocabulary (OOV) words. For example, s-grams have consistently outperformed other approximate string matching techniques, like edit distance or n-grams. The Jaccard coefficient has traditionally been used as an s-gram based string proxi...


The LIMSI 1999 Hub-4E Transcription System

In this paper we report on the LIMSI 1999 Hub-4E system for broadcast news transcription. The main difference from our previous broadcast news transcription system is that a new decoder was implemented to meet the 10xRT requirement. This single pass 4-gram dynamic network decoder is based on a time-synchronous Viterbi search with dynamic expansion of LM-state conditioned lexical trees, and with...


Semantic Text Clusters and Word Classes – the Dualism of Mutual Information and Maximum Likelihood

Dynamically modeling the word distribution in a variety of texts is a goal with various applications. For speech recognition a dynamic unigram may efficiently be used for the adaptation of longer ranging language models. For information retrieval it may be a good starting point to predict the most characteristic words in document dependent queries. This short paper presents two approaches for a...


Speech recognition of broadcast sports news

This paper shows that a domain-dependent language model and state-skipped HMMs can achieve improvements in word recognition accuracy on a broadcast sports news transcription task. Although a domain-dependent language model is much better than a general model in terms of word error rate, the smaller training corpus for a special topic relative to the general news corpus leads to problems especia...



Journal title:
  • IEICE Transactions

Volume 89-D, Issue

Pages -

Publication date 2006